Abstract

In trials with binary outcomes, assessed repeatedly at pre-specified times and where the subject is considered to 
have experienced a failure at the first occurrence of the outcome, interim analyses are performed, generally, after 
half or more of the subjects have completed follow-up. Depending on the duration of accrual relative to the length 
of follow-up, this may be inefficient, since there is a possibility that the trial will have completed accrual prior to the 
interim analysis. An alternative is to plan the interim analysis after subjects have completed follow-up to a time that 
is less than the fixed full follow-up duration. Using simulations, we evaluated three methods to estimate the event 
proportion for the interim analysis in terms of type I and II errors and the probability of early stopping. We 
considered: 1) estimation of the event proportion based on subjects who have been followed for a pre-specified 
time (less than the full follow-up duration) or who experienced the outcome; 2) estimation of the event proportion 
based on data from all subjects that have been randomized by the time of the interim analysis; and 3) the 
Kaplan-Meier approach to estimate the event proportion at the time of the interim analysis. Our results show that 
all three methods preserve the type I and II errors, at comparable levels, in certain scenarios. In these cases, we recommend 
using the Kaplan-Meier method because it incorporates all the available data and has a greater probability of early 
stopping when a treatment effect exists. 
Background 

Interim analyses that permit early stopping of a ran- 
domized controlled trial (RCT) for extremely positive 
results or for futility are included in the design for eth- 
ical and economic reasons. Strategies have been devel- 
oped for interim analyses such that the overall type I 
error of the entire trial is preserved at a fixed level 
(Haybittle 1971; O'Brien and Fleming 1979; Peto et al. 
1976; Pocock 1977). 

Often, the primary outcome is whether or not a subject 
experienced an event over a fixed period of time T. 
In some trials, the outcome is assessed repeatedly 
at pre-specified times during follow-up, and the subject 
is considered a failure if the event occurs at any time. 
For example, in a cardiovascular RCT investigating the 
effect of an intervention for preventing post-thrombotic 
syndrome, subjects can be assessed every 6 months for 
up to 24 months using a disease-specific questionnaire 
(Enden et al. 2012; Vedantham et al. 2013). A failure has 
occurred if the questionnaire score exceeds a pre-specified 
threshold. Another example would be a breast cancer 
radiotherapy RCT where adverse cosmesis (i.e. a dichot- 
omy), assessed at 1, 3 and 5 years post-randomization, 
would be the primary safety outcome and the focus of 
the interim analysis. 

Interim analyses are generally performed after half or 
more of the subjects have completed follow-up (Pedley 
2011). Depending on the duration of accrual relative to 
the length of follow-up, this strategy may be inefficient 
because it is possible that accrual will have been completed 
and patients will have finished treatment prior to 
the interim analysis. If, however, the interim analysis were 
done earlier and a statistically significant effect were 
found, the trial could be stopped, and all future subjects 
would receive the experimental therapy. 

In this situation, one alternative is to plan an interim 
analysis after a smaller percentage of subjects have com- 
pleted full follow-up. However, there is a low probability 
of terminating the trial early when the interim analysis is 
based on so little information, and, therefore, such an 
analysis would unnecessarily spend alpha (Togo and 
Iwasaki 2013). A second alternative is to plan the interim 
analysis after half or more of the subjects have 
completed a specified portion R of the follow-up, where 
R < T and T is the fixed full follow-up duration for each 
subject. 

Several researchers have studied methods that com- 
bine data from subjects who have completed full follow- 
up with those who have been followed for duration R in 
situations where the outcome is reversible (Marschner 
and Becker 2001; Sooriyarachchi et al. 2006; Whitehead 
et al. 2008). In our research, however, the situation is dif- 
ferent in that the outcome can be ascertained at any of the 
pre-specified visits during follow-up and is irreversible. 

In this paper, we consider 3 methods of estimating the 
interim event proportion (risk) for each treatment group 
in an RCT for an interim analysis: 1) estimated event 
proportion based only on subjects who have been 
followed for at least duration R or who had an outcome 
event; 2) the event proportion based on data from sub- 
jects that have been randomized by the time of the in- 
terim analysis, and 3) the Kaplan-Meier approach to 
estimate the event proportion. We investigate the effect 
of each method on the type I and II errors and the prob- 
ability of early stopping through computer simulation of 
various trial scenarios. 

Methods 

Consider a trial designed to detect an absolute risk reduction 
(ARR) between the standard group ($\pi_0$) and the experimental 
group ($\pi_1$) over the time period 0 to T using a 
normal approximation Z-test with 

$$Z = \frac{\hat{\pi}_0 - \hat{\pi}_1}{\sqrt{\dfrac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \dfrac{\hat{\pi}_0(1-\hat{\pi}_0)}{n_0}}}$$

where $\hat{\pi}_0$ and $\hat{\pi}_1$ are the observed proportions, $n_0$ and 
$n_1$ are the group sample sizes, and we are testing the 
one-sided hypotheses $H_0: \pi_1 \ge \pi_0$ versus $H_1: \pi_1 < \pi_0$. Furthermore, 
we assume 90% power, an alpha of 0.025 and 
a 1:1 randomization. Since the normal distribution is 
symmetric, the p-value for a one-sided test is equivalent 
to half of the two-sided p-value. 
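
The design parameters above (90% power, one-sided alpha of 0.025, 1:1 randomization) can be turned into a sample size with a standard unpooled normal-approximation formula. The sketch below is an assumption for illustration: the paper does not state which formula it used, the function name `total_sample_size` is hypothetical, and this simple version is not guaranteed to reproduce every row of Table 2.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(pi0, pi1, alpha=0.025, power=0.90):
    """Total N (both arms, 1:1) for a one-sided unpooled Z-test of two proportions.

    Assumed formula (not confirmed by the paper):
        n per group = (z_alpha + z_beta)^2 * [pi0(1-pi0) + pi1(1-pi1)] / (pi0 - pi1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the one-sided test
    z_beta = NormalDist().inv_cdf(power)       # quantile matching the target power
    variance_sum = pi0 * (1 - pi0) + pi1 * (1 - pi1)
    n_per_group = (z_alpha + z_beta) ** 2 * variance_sum / (pi0 - pi1) ** 2
    return 2 * ceil(n_per_group)               # round each arm up, then double


print(total_sample_size(0.30, 0.25))  # prints 3342
```

Under these assumptions the first trial in Table 2 (event proportions 0.30 and 0.25) requires 3342 subjects in total, which agrees with the tabulated N.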

Suppose the trial requires 4 years for enrolment, each 
subject is followed for 2 years (i.e. T = 24 months), and 
failures are ascertained at any of the four 6-monthly 
pre-specified visits post-randomization. Let the start of the 
trial (calendar time) be denoted by $T_0$. Following the 
notation in Table 1, let $t_j$ be the pre-specified visit times in 
the trial, where $t_j \le T$ and $j$ is the visit number, with $j$ = 
0, 1, 2, ..., $J$, and $J$ denotes the number of visits (e.g. $J$ = 4 
and $t_0$ = 0, $t_1$ = 6, $t_2$ = 12, $t_3$ = 18, $t_4$ = 24 months). Suppose 
an interim analysis is scheduled to occur when 50% 
of the subjects have completed R = 12 months ($t_2$ = R) of 
follow-up which, assuming a uniform recruitment pattern, 
corresponds to approximately 36 months after the 
start of the trial, denoted by $T_I$ (Figure 1). At the interim 
analysis, the proportion of subjects who fail in each 
group could be estimated using any of the following 
approaches. 
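
The 36-month figure follows from simple arithmetic under uniform accrual: half of the subjects are enrolled by month 24, and they need a further 12 months to reach R. A minimal sketch of that calculation is given here; the function name and arguments are hypothetical, and uniform accrual is the only assumption.

```python
def interim_calendar_time(accrual_months, fraction_followed, partial_follow_up):
    """Calendar month at which `fraction_followed` of all subjects have
    completed `partial_follow_up` months, assuming uniform accrual."""
    if not 0 < fraction_followed <= 1:
        raise ValueError("fraction_followed must be in (0, 1]")
    # Under uniform accrual, a fraction f of subjects is enrolled by f * accrual.
    enrolment_time = fraction_followed * accrual_months
    return enrolment_time + partial_follow_up


print(interim_calendar_time(48, 0.5, 12))  # R = 12 months: prints 36.0
print(interim_calendar_time(48, 0.5, 24))  # full follow-up T = 24 months: prints 48.0
```

The second call illustrates the inefficiency described in the Background: waiting for 50% of subjects to complete the full 24 months places the interim analysis at month 48, when accrual has already finished.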

Method 1: event proportion based on subjects followed 
for at least duration R or who had an event 

In RCTs where the length of enrolment relative to follow-up 
is not an issue, subjects included in the interim analysis 
are those who have completed their full follow-up T or 
who have had an event prior to completion (Pedley 2011). 
A similar approach is used here whereby we include only 
subjects who have completed at least duration R (where 
$t_r$ = R, and $r$ refers to the visit at which follow-up time equals 
R) of their full follow-up T, or have had an event prior 
to this point. Since the interim analysis occurs after 50% 
of the subjects have completed at least follow-up of R, 
this approach includes the first 50% of enrolled subjects 
plus those subjects that have experienced an event but 
have not completed follow-up of R. For each treatment 
group $i$ (0 = standard, 1 = experimental) at visit time $t_j$, 
let $m_{ij}$ be the number of subjects at risk (i.e. have completed 
the visit at $t_j$ without having an event), and let $e_{ij}$ be 
the number of new events diagnosed. Then the event 
proportion in treatment group $i$ at the time of interim 
analysis $T_I$ is given by: 

$$\hat{\pi}_i(T_I) = \frac{\sum_{k=1}^{J} e_{ik}}{m_{ir} + \sum_{k=1}^{J} e_{ik}}$$

The individuals who have experienced an event but 
have not completed duration R of follow-up are included 
in the numerator and the denominator. 

Method 2: event proportion based on data from subjects 
that have been randomized by the time of the interim 
analysis 

This simple approach uses data from the subjects randomized 
by the time of the interim analysis $T_I$ (i.e. once 50% of 
the subjects have been followed for at least time R). Let $n_i$ 
be the number of subjects who have been randomized to 
treatment group $i$. Then the event proportion for each 
group at the time of interim analysis $T_I$ is given by 

$$\hat{\pi}_i(T_I) = \frac{\sum_{k=1}^{J} e_{ik}}{n_i}$$

which is simply the total number of observed events 
divided by the number of subjects randomized by 
time $T_I$. 

Table 1 Notation table for estimation of event proportions 

Visit number j   Visit time t_j   Subjects at risk m_j   New events e_j   Incidence at visit j, d_j
0                t0 (<6 m)        m0                     e0 = 0           d0 = 0
1                t1 (6 m)         m1                     e1               d1 = e1/m1
2                t2 (12 m)        m2                     e2               d2 = e2/m2
3                t3 (18 m)        m3                     e3               d3 = e3/m3
4                t4 (24 m)        m4                     e4               d4 = e4/m4

Method 3: Kaplan-Meier approach 

This approach also uses all the data available at the time 
of the interim analysis $T_I$ (i.e. once 50% of the subjects 
have been followed for at least time R). Individuals 
who have not completed follow-up time T (i.e. the full 
fixed follow-up duration) and have not had the event 
are simply right-censored at the latest time that 
they were observed. The Kaplan-Meier (KM) estimates 
can then be calculated using all randomized subjects, 
and the event proportion in treatment group $i$ at the 
time of interim analysis $T_I$ is given by 

$$\hat{\pi}_i(T_I) = 1 - \hat{S}_i(T)$$

where $\hat{S}_i(T)$ is the KM survivor function estimate. Following 
the notation in Table 1, this is equivalent to 

$$\hat{\pi}_i(T_I) = 1 - \prod_{k=1}^{J} (1 - d_{ik})$$
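
To make the three estimators concrete, the following sketch applies them to one small, made-up treatment arm. Everything here is an illustrative assumption rather than the authors' implementation: each subject is a hypothetical `(follow_up, event_visit)` pair, where `follow_up` is the months of follow-up accrued by the interim analysis and `event_visit` is the visit time of the first observed event (or `None`), and events observed at any visit up to the interim are counted.

```python
VISITS = (6, 12, 18, 24)  # visit times in months; 24 is the full follow-up T


def method1(records, partial_follow_up=12):
    """Events among subjects followed for at least R, or who had an event."""
    included = [(fu, ev) for fu, ev in records
                if ev is not None or fu >= partial_follow_up]
    events = sum(1 for _, ev in included if ev is not None)
    return events / len(included)


def method2(records):
    """Observed events divided by all subjects randomized so far."""
    events = sum(1 for _, ev in records if ev is not None)
    return events / len(records)


def method3(records, visits=VISITS):
    """Kaplan-Meier estimate on the visit grid: 1 - prod(1 - e_k / m_k)."""
    survival = 1.0
    for t in visits:
        # At risk at visit t: reached t without an earlier event.
        at_risk = sum(1 for fu, ev in records
                      if (ev is None and fu >= t) or (ev is not None and ev >= t))
        new_events = sum(1 for _, ev in records if ev == t)
        if at_risk == 0:
            break  # nobody has been followed this far yet
        survival *= 1 - new_events / at_risk
    return 1 - survival


# Hypothetical arm of 10 subjects as seen at the interim analysis.
arm = [(24, None), (24, 18), (20, None), (18, 6), (14, None),
       (12, None), (10, 6), (8, None), (5, None), (2, None)]

print(round(method1(arm), 3), method2(arm), round(method3(arm), 3))  # 0.429 0.3 0.5
```

In this toy arm, Method 2 gives the smallest estimate because recently randomized subjects with no assessment yet still count in its denominator, whereas the Kaplan-Meier estimate treats them as censored.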

We evaluated these methods in terms of overall type I 
and II errors and the probability of early stopping of the 
trial for a positive result at the interim. The interim ana- 
lysis was performed using the Haybittle-Peto (Haybittle 
1971; Peto et al. 1976) and O'Brien-Fleming (O'Brien 
and Fleming 1979) monitoring boundaries for extreme 
positive results. These boundaries are conservative and 
require small p-values for early stopping of the trial. 
Other less conservative boundaries such as the Pocock 
approach were not evaluated (Freidlin and Korn 2009; 
Pocock 2005). 
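
How conservative these boundaries are can be seen by converting the critical values reported in the Simulation section (interim and final values of 2.797 and 1.977 for O'Brien-Fleming, 3.0 and 1.967 for Haybittle-Peto) into one-sided p-values, and by checking the overall type I error of the two-stage rule. The sketch below is an idealized check, not the paper's simulation: it assumes the interim and final Z statistics are bivariate normal with correlation equal to the square root of the information fraction, and it assumes an information fraction of 0.5, which an interim analysis based on partial follow-up need not satisfy. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist

BOUNDARIES = {  # name: (interim critical value g, final critical value f)
    "O'Brien-Fleming": (2.797, 1.977),
    "Haybittle-Peto": (3.0, 1.967),
}


def one_sided_p(z):
    """One-sided p-value corresponding to a critical value z."""
    return 1 - NormalDist().cdf(z)


def overall_type1_error(g, f, info_fraction=0.5, reps=100_000, seed=1):
    """Monte Carlo estimate of P(Z_interim > g, or Z_interim < g and Z_final > f)
    under the null, for an idealized pair of correlated Z statistics."""
    rng = random.Random(seed)
    rho = math.sqrt(info_fraction)  # assumed correlation between the two looks
    rejections = 0
    for _ in range(reps):
        z_interim = rng.gauss(0, 1)
        z_final = rho * z_interim + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        stopped_early = z_interim > g
        if stopped_early or z_final > f:
            rejections += 1
    return rejections / reps


for name, (g, f) in BOUNDARIES.items():
    print(name, round(one_sided_p(g), 5), round(overall_type1_error(g, f), 4))
```

Under these assumptions both rules require an interim one-sided p-value below roughly 0.003, and the estimated overall type I error stays close to the nominal 0.025.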



Figure 1 Plot showing the follow-up time in months for 10 subjects and the proposed time for the interim analysis after 5 (50%) 
subjects have completed 12 months of follow-up. Horizontal axis: calendar time. Legend: visit time (months); visit with first event; time of interim analysis. 

Simulation 

We considered six RCTs similar to the trial described in 
the Methods section (see Table 2). Data for the binary 
endpoint were generated using the binomial distribution 
under the null and alternative hypotheses. 

For each subject with an event, the time at which the 
event occurred was randomly assigned to reflect five 
clinically-plausible scenarios (Table 3), using the following: 
1) events were distributed equally across the four time- 
points with probabilities (0.25, 0.25, 0.25, 0.25) for both 
groups; 2) the majority of the events occurred in the first 
two time-points with probabilities (0.35, 0.30, 0.20, 0.15) 
for both groups; 3) the majority of the events occurred in 
the last two time-points with probabilities (0.15, 0.20, 0.30, 
0.35) for both groups; 4) the standard group follows distri- 
bution (3) and the experimental group follows distribution 
(2); and 5) the reverse of scenario (4). Entry times for 
subjects over 48 months were randomly generated from 
a uniform distribution, and the interim analysis was car- 
ried out after 50% of the subjects completed R = 12 months 
of follow-up. We carried out 10,000 replications for each 
trial. Given that Z(x) and Z(y) are the interim and 
final test statistics, respectively, the type I error rate, 
$P_{H_0}$(Z(x) > g or [Z(x) < g and Z(y) > f]), and the type II 
error, $P_{H_1}$(Z(x) < g and Z(y) < f), were obtained from data 
generated under the null and alternative hypotheses, 
respectively, where g and f are the interim and final 
critical values of the O'Brien-Fleming (g = 2.797, f = 1.977) 
and Haybittle-Peto (g = 3.0, f = 1.967) monitoring boundaries. 
The probability of early stopping, $P_{H_1}$(Z(x) > g), was 
obtained under the alternative hypothesis. All analyses 
were performed in R 2.15 (www.r-project.org). 
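
The data-generating steps described above can be sketched for a single treatment arm as follows. This is a hedged illustration in Python rather than the authors' R code: the function name, the `(follow_up, event_visit)` record layout, and the fixed interim time of 36 months are assumptions (the paper triggers the interim once 50% of subjects reach 12 months, which only approximately corresponds to month 36).

```python
import random

VISITS = (6, 12, 18, 24)          # visit times in months
EARLY = (0.35, 0.30, 0.20, 0.15)  # scenario 2: most events at the first two visits
LATE = (0.15, 0.20, 0.30, 0.35)   # scenario 3: most events at the last two visits


def simulate_arm(n, risk, visit_probs, rng, accrual=48.0, interim=36.0):
    """Return the data visible at the interim as (follow_up, event_visit) pairs."""
    observed = []
    for _ in range(n):
        entry = rng.uniform(0.0, accrual)  # uniform entry over the accrual period
        has_event = rng.random() < risk    # binary outcome over full follow-up
        event_visit = (rng.choices(VISITS, weights=visit_probs)[0]
                       if has_event else None)
        if entry >= interim:
            continue                       # not yet randomized at the interim
        follow_up = min(interim - entry, VISITS[-1])
        if event_visit is not None and event_visit > follow_up:
            event_visit = None             # event not yet observable
        observed.append((follow_up, event_visit))
    return observed


rng = random.Random(2024)
standard = simulate_arm(4000, 0.30, LATE, rng)
seen = sum(1 for _, ev in standard if ev is not None)
print(len(standard), round(seen / len(standard), 3))
```

Drawing one Bernoulli outcome per subject is equivalent to generating the arm's event count from a binomial distribution; the observed event proportion at the interim is below the 24-month risk because many subjects have not yet reached the visit at which their event would be detected.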

Results 

The results of the type I error rates for the three methods 
are shown graphically in Figure 2. The three methods have 
comparable type I error rates across each of the trials and 
event distribution scenarios. The methods in general have 
nominal or close-to-nominal type I error rates when the 
event distribution probabilities are equivalent between 
treatment groups or when the experimental treatment 
group events occurred earlier in the trial compared with 
the standard group. However, under these same scenarios, 
slightly greater-than-nominal type I error rates are seen in 
the trials where ($\pi_0$, $\pi_1$) = (0.30, 0.10) and ($\pi_0$, $\pi_1$) = (0.50, 
0.45), where the type I error rates are approximately 0.03. 
For the scenario where the experimental group events occurred 
later in the trial compared with the standard group, 
the type I error was generally inflated for all methods. 

Table 2 Summary of six trials considered for simulation 
with β = 0.10 and a one-sided α = 0.025 

Standard group event    Experimental group event   Absolute risk              N
proportion ($\pi_0$)    proportion ($\pi_1$)       reduction ($\pi_0 - \pi_1$)
0.30                    0.25                       0.05                       3342
0.30                    0.20                       0.10                       796
0.30                    0.10                       0.20                       160
0.50                    0.45                       0.05                       4182
0.50                    0.40                       0.10                       1030
0.50                    0.30                       0.20                       242

Table 3 Summary of the event distribution probabilities 
for the simulated scenarios 

Scenario   Event distribution probabilities by visit time ($t_1$, $t_2$, $t_3$, $t_4$)
           Standard group            Experimental group
1          0.25, 0.25, 0.25, 0.25    same as standard
2          0.35, 0.30, 0.20, 0.15    same as standard
3          0.15, 0.20, 0.30, 0.35    same as standard
4          0.15, 0.20, 0.30, 0.35    0.35, 0.30, 0.20, 0.15
5          0.35, 0.30, 0.20, 0.15    0.15, 0.20, 0.30, 0.35

The three methods also have comparable type II error 
rates (Figure 3). In general, under all event distribution 
scenarios and trials, the type II error rates are comparable 
to the nominal value of 0.10 regardless of the interim analysis 
method or stopping boundary rule. Moreover, in the 
scenario where the experimental group events occurred 
later in the trial compared with the standard group, the 
type II error rates are much lower than the nominal value 
for the trials with ARRs of 0.05 and 0.10. 

Under the alternative hypothesis, methods 1 and 3 have 
comparable probabilities of early stopping in scenarios 
where the treatment groups have equivalent event distribution 
probabilities over time, specifically in the trials 
where $\pi_0$ = 0.30 (Figure 4). Method 3 has a slightly greater 
probability of early stopping than method 1 in the trials 
where $\pi_0$ = 0.50. Moreover, method 2 has the smallest 
probability of early stopping in scenarios where the treatment 
groups had equivalent event distribution probabilities 
over time. On the other hand, all methods have 
comparable probabilities of early stopping in the scenarios 
where the treatment groups had contrasting event distributions 
over time. The highest probabilities of early stopping 
are seen in the trials where the experimental group 
had a smaller proportion of events occur earlier in the trial 
compared with the standard group, and the lowest probabilities 
of early stopping are seen in the opposite scenario. 
In general, the probability of early stopping is greater 
using the O'Brien-Fleming boundaries compared with the 
Haybittle-Peto monitoring boundaries. 

Discussion 

In RCTs with binary endpoints, interim analyses are generally 
conducted after a considerable percentage of subjects 
have completed follow-up. However, under certain situations 
this approach is not optimal since the trial may 
have completed accrual and all the subjects will have 
been treated by that time. We evaluated three approaches 
for an interim analysis when a considerable percentage of 
subjects complete a follow-up time that is less than the 
planned trial follow-up. 

Figure 2 Overall type I error rates for each trial by event distribution scenario. Panels: Event Distribution Scenarios 1 to 5. Lines: Haybittle-Peto and O'Brien-Fleming boundaries, each with Methods 1, 2 and 3. 



We observed that the type I error rates were comparable 
for all three methods. For most trials simulated, 
under the scenarios where the event distributions were 
equivalent between treatment groups or the experimental 
group had events occur earlier than the standard 
group, the type I error rates were close to the nominal 
value. These results concur with those of Pedley (2011), 
who showed that conducting the interim analysis after a 
considerable percentage of subjects had completed full 
follow-up (using method 2) produced nominal type I 
error rates, albeit in the situation where events could be 
measured at any time during follow-up and not just at 
specific time points. However, we also observed that the 
type I error rate increased with increasing absolute risk 
reduction for trials with a standard group event proportion 
of 0.3, thus resulting in slightly higher type I error 
rates for the trial with an ARR of 0.20. In addition, similar 
slightly higher type I error rates were seen in the trial 
with a standard group event proportion of 0.5 and an 
ARR of 0.05. This is perhaps due to a combination of less 
variability and a small sample size for the former, and a 
large sample size and small ARR for the latter. Therefore, 
trialists should be cautious of using any of these 
methods under these situations. 

Figure 3 Overall type II error rates for each trial by event distribution scenario. Panels by event distribution scenario and standard group event proportion (0.3 and 0.5). Horizontal axis: absolute risk reduction (0.05, 0.1, 0.2). Lines: Haybittle-Peto and O'Brien-Fleming boundaries, each with Methods 1, 2 and 3. 

Figure 4 Probabilities for early stopping under the alternative hypothesis for each trial by event distribution scenario. 

While there were situations in which the type I errors 
were slightly inflated with all methods, the methods 
performed much better with regard to the type II errors 
under all scenarios, suggesting that these methods will 
not have a negative effect on the power to detect the hy- 
pothesized difference between treatment groups provided 
the difference exists. Under the scenarios where the ex- 
perimental group had events occur later compared with 
the standard group, the methods showed increased overall 
power because the probability of early stopping was 
greater in these scenarios. However, under these sce- 
narios, the type I error rates are inflated. 

The methods differed on the probability of early stopping 
under the alternative hypothesis, with method 2 having 
the lowest probability. This is because this approach includes 
data from all subjects that have been randomized 
by the time of the interim analysis in the denominator 
of the estimation of the event proportion even though a 
subgroup of these patients would not have had any assessment 
of the outcome since they would not have 
reached their first time point for outcome assessment. 
The consequence is the dilution of the interim treatment 
effect, leading to lower interim power. Method 3 also uses 
all available data from randomized subjects at the time of 
the interim analysis. However, it employs a conditional 
probability approach which differentiates between those 
subjects who have not yet had an assessment visit (i.e. censored) 
and those who are at risk at each assessment visit, thus 
yielding a greater probability of early stopping. Similarly, 
since method 1 uses only a subset of randomized subjects 
at the time of the interim analysis, the estimated interim 
treatment effect is less diluted and, therefore, method 1 has a greater 
probability of early stopping than method 2. Conversely, 
since method 1 uses a smaller number of subjects compared with 
method 3, its probability of early stopping is slightly 
lower than that of method 3 in trials where the standard group 
event proportion is 0.5, because the variability is greater 
for proportions closer to 0.5. Furthermore, we observed 
that the probabilities of early stopping are greater using 
the O'Brien-Fleming boundary compared with the 
Haybittle-Peto boundary since the former is less conservative. 
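
The size of the dilution under Method 2 can be illustrated with a short worked calculation. The figures below are a sketch under stated assumptions rather than a result from the paper: uniform accrual over 48 months, an interim analysis at exactly 36 months, and the equal-probability event timing of scenario 1. The symbols $U$ and $q$ are introduced here for illustration only.

```latex
% U = follow-up accrued at the interim by a subject randomized before month 36,
%     so U is uniform on (0, 36) months.
% q = probability that an event occurring within T = 24 months has already
%     been observed at the interim.
\begin{align*}
P(U \ge t_k) &= \frac{36 - t_k}{36}, \qquad t_k \in \{6, 12, 18, 24\}, \\
q &= \sum_{k=1}^{4} 0.25 \, P(U \ge t_k)
   = 0.25 \cdot \frac{30 + 24 + 18 + 12}{36}
   = \frac{84}{144} \approx 0.58, \\
E\left[\hat{\pi}_i\right] &\approx q \, \pi_i
  \quad\Longrightarrow\quad
  E\left[\hat{\pi}_0 - \hat{\pi}_1\right] \approx 0.58 \, (\pi_0 - \pi_1).
\end{align*}
```

Under these assumptions a true ARR of 0.10 would appear as an interim difference of about 0.058 when Method 2 is used, which is consistent with its lower probability of early stopping.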

Although the largest probabilities of early stopping 
under the alternative hypothesis and the smallest type II 
errors were seen under the scenario where the experimental 
group had events occurring later compared with 
the standard group, the type I error is greatly inflated 
and, therefore, none of the methods can be recommended 
in this situation. Since there is a delay in occurrence of the 
event in the experimental group, this may be perceived as 
an effect of treatment. However, in situations where investigators 
are interested in the occurrence of an event over a 
fixed time period, this scenario, although rare, would still 
be considered under the null hypothesis. 

Our study had some limitations. The generalizability of 
our findings may be limited since we evaluated six trial 
scenarios with particular event distributions over time. In 
diseases where the event distributions over time differ 
from the ones evaluated in this research, further simula- 
tions would be required to evaluate these methods. Sec- 
ondly, we evaluated trials with one interim analysis after 
50% of the subjects completed 12 months of follow-up 
using the O'Brien-Fleming or Haybittle-Peto approach. 
These findings may not be applicable to trials in which in- 
terim analyses are required at multiple times or when 
using the alpha spending function approach to monitor 
the trial. Finally, the biases of the interim event propor- 
tions and treatment effects were not evaluated primarily 
because it is well known that estimators at the interim are 
biased, especially for estimators that allow for early stop- 
ping for positive results. However, further investigation on 
the estimators is needed. 

Conclusion 

Nonetheless, we have shown that under certain scenarios, 
conducting an interim analysis when a considerable num- 
ber of subjects have some follow-up data, using any of the 
methods, preserves the type I and II errors. Although all 
three methods preserve type I and II errors under these 
scenarios, we recommend using the Kaplan-Meier method 
because it incorporates all the available data and has 
greater probability of early stopping when the treatment 
effect exists. We have also shown that under certain sce- 
narios, none of these methods is suitable for an interim 
analysis, and trialists should be cautious when using them. 
Finally, when possible, an interim analysis should be 
undertaken when data from a considerable number of 
subjects who have completed full follow-up are available. 
However, if waiting for a considerable number of subjects 
to complete full follow-up is not an efficient approach, 
such as in the examples described, the methods outlined 
in this paper should be considered and evaluated to fit the 
specific needs of the trial. 